Skip to content

feat: Mac port #183#300

Merged
ethanpailes merged 17 commits intoshell-pool:masterfrom
c22:macos
Feb 5, 2026
Merged

feat: Mac port #183#300
ethanpailes merged 17 commits intoshell-pool:masterfrom
c22:macos

Conversation

@c22
Copy link
Contributor

@c22 c22 commented Jan 29, 2026

This work is based on the previous work by @seruman in their fork https://github.com/seruman/shpool/tree/macos

I tried to retain as much of the existing work done and base my work on top of that.

seruman's fork had socket timeouts disabled on macOS. I wasn't able to reproduce socket timeouts not working on macOS but I was seeing unhandled EINVAL errors on daemon presence probe connections (where the peer closes immediately). These are now handled gracefully, so I have dropped the changes that disabled socket timeouts.

I also ran into a bug where ConfigWatcher was not using canonicalized paths, but inotify et. al. use canonicalized paths. This likely wasn't being ran into much on Linux environments but becomes very obvious on macOS because /tmp is a symlink.

Timing on macOS is indeed tricky, the fix would involve probably significantly re-working the way test are being done, but the workaround I found was to run them in serial.

I think the rest of the changes were relatively minor.

Here's a summary of the changes:

  • fix: handle EINVAL on set_read_timeout for macOS
  • test: fix integration tests for macOS
  • chore: bump shpool_pty to 0.3.2
  • fix: correct test synchronization in ConfigWatcher
  • fix: canonicalize watched paths in ConfigWatcher
  • fix: use into_owned() to avoid double allocation

and the cherry-picked commits from https://github.com/seruman/shpool/tree/macos

  • use platform agnostic watcher instead of inotify
  • add macos/bsd specific passwd fields
  • use std::env::current_exe instead of proc fs
  • add macos specific peer creds handling
  • use getpeereid instead of LOCAL_PEERCRED socket opts

@google-cla
Copy link

google-cla bot commented Jan 29, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@c22 c22 marked this pull request as ready for review January 29, 2026 06:24
@c22 c22 mentioned this pull request Jan 29, 2026
Copy link
Contributor

@ethanpailes ethanpailes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. The main merge blockers are the lack of safety comments and the fact that I still have to patch this in on my macbook and run the tests.

We should also consider adding support for running the tests under macos in CI. I'm not sure how hard that will be to set up. I think it would be fair to punt on doing that until a followup.

@ethanpailes
Copy link
Contributor

I just patched this branch in on my macbook and ran the tests, including the single-threading flag. Unfortunately I saw a bunch of errors.

@c22 is this expected? I would be ok with merging even with these errors since it's an improvement over the status quo, but we'll need to add something to the README explaining that while shpool builds on macos it's not yet an officially supported platform since the test suite doesn't pass yet.

@c22
Copy link
Contributor Author

c22 commented Jan 29, 2026

@c22 is this expected?

Ah yeah sorry, I should have been more explicit about that. A bunch of integration tests failed for me as well and bounce is still probably a bit flaky on macOS even in single threaded testing. I will diff against my side later today and see if anything stands out between your results.

@ethanpailes
Copy link
Contributor

A bunch of integration tests failed for me as well and bounce is still probably a bit flaky on macOS even in single threaded testing.

Got it. Where you thinking of trying to fix them all before merge or do you want to just get something merged and then fix the remaining tests on mac in followups? Like I said, since all the linux tests are passing I'm happy to merge even with some tests broken on mac.

@c22
Copy link
Contributor Author

c22 commented Jan 30, 2026

I checked my end and the exact same tests were failing on my machine as well, so yes, that was expected.

Where you thinking of trying to fix them all before merge or do you want to just get something merged and then fix the remaining tests on mac in followups?

I was thinking of leaving them failed for now as they were also failing/skipped on seruman's builds.

With some quick-fixes and workarounds, I was able to get 8 more tests to pass (2 are related to hardcoded fish/zsh paths, 6 are related to aggressive timing, eg. 500ms on Linux -> 2500ms on macOS). 4 more tests are MOTD pager related, I didn't look into them too deeply.

5 of the tests are re-attachment failures, these are a bit more concerning to me, because re-attachment is probably a key part of my use-case on Mac. Fixing that locally required changing the way shutdown() works in shell.rs to handle ENOTCONN errors on macOS.

I'm happy for you guys to take this as-is if you like (with the unsafe comments added), but if you're receptive to another fix, I can incorporate it as part of this change as well, I just probably won't get to it until after the weekend.

@ethanpailes
Copy link
Contributor

One thing we could do is put #[cfg(not(target_os = "macos"))] on all the tests that don't pass on mac, then we could at least ensure that the tests which do pass remain passing by configuring some CI.

The motd failures are likely because the motd crate doesn't work on mac. I certainly didn't do anything to make it work nicely on mac. Does mac even have a motd? We might want to just disable that feature for macs. It's only really useful in institutional settings where the motd has useful information. It's pretty useful within google, but I doubt I would have added the feature if it were not for google's active use of motd. I rather doubt people care about that feature too much in most other places.

I'm happy for you guys to take this as-is if you like (with the unsafe comments added), but if you're receptive to another fix, I can incorporate it as part of this change as well, I just probably won't get to it until after the weekend.

Waiting a few days for at least the simple fixes seems worthwhile. Why don't you do those fixes, but don't break your back. If you can get the re-attachment fixes in those seem like they would be pretty nice to have. Just disable the motd and anything else that is too hard. Then we can follow up with some CI to keep things up to date and add a section to the README explaining the state of macos support.

c22 added 4 commits February 5, 2026 17:42
File system watchers (inotify, FSEvents) report canonical paths, but ConfigWatcher was storing paths as provided by the caller. When a path contained symlinks (e.g., /var -> /private/var on macOS), event paths wouldn't match stored paths, causing config changes to be missed.

Add canonicalize_path() to resolve symlinks in the existing portion of a path (handling paths where the final components don't exist yet), and apply it when storing paths in the watcher.

Include a regression test that creates a symlink and verifies events are received when watching through the symlink.
The worker_ready() test helper signals when the worker is idle, but it was signaling even when there was a pending reload_deadline. This could cause tests to proceed before the debounce timeout fired, leading to flaky test results.

Only signal idle when there's no pending work (reload_deadline is None).

Update the debounce test to not call worker_ready() between writes, as that now correctly waits for the reload to complete.
This version includes macOS support via ptsname (vs ptsname_r).

See: shell-pool/shpool_pty#4
- Forward test_hooks feature from shpool to libshpool crate
- Shorten socket paths to avoid macOS 104-byte sun_path limit
- Forward HOME and PATH env vars to attach subprocess (macOS typically lacks XDG_RUNTIME_DIR, needs HOME instead)
- Use std::env::var directly to avoid unused import warning
- Document that tests should run with --test-threads=1 on macOS due to FSEvents timing behavior under concurrent load
@c22
Copy link
Contributor Author

c22 commented Feb 5, 2026

That should be most of it..

I believe the pager/motd tests are not easily fixed without some changes to shpool_pty but I can't remember what my findings were on that, I should have written them down. Most of the other issues were around race conditions and fixing them for macOS should hopefully prove beneficial to Linux test stability as well (particularly when under load or on slower machines). I didn't feel like arbitrarily increasing the wait times for the other tests, because it would increase the test times on Linux and CI/CD environments. Re-working the test invocation in to something a bit more generic (event based?) is probably a bigger task.

Either way, hopefully this proves to be at least somewhat usable.

c22 added 6 commits February 5, 2026 18:01
macOS returns EINVAL when setting socket timeout on a connection where the peer has already closed (documented in setsockopt(2)).

This typically happens with daemon presence probe connections.
On macOS, calling shutdown() on a Unix socket after the peer has disconnected returns ENOTCONN (error 57). This was causing the shell->client thread to exit with an error, which then caused re-attachment to fail.

Add a shutdown_socket() helper that ignores ENOTCONN errors since the socket is already closed in that case.
Add worker_ready() synchronization before the rename operation, matching the pattern used in other config_watcher tests. Without this, the filesystem event from the rename could occur before the watcher is fully ready to receive it, causing flaky test failures on macOS where FSEvents timing differs from Linux's inotify.
* Wait for daemon-wrote-s2c-chunk event before sending the large command

This avoids a race where the command is sent before the shell is ready to receive input.

* Silence macOS bash deprecation warning via BASH_SILENCE_DEPRECATION_WARNING env var

The warning contains 'd' which caused read_until(b'd') to stop early before reaching "food" in the restore buffer.
Skip 6 tests that fail on macOS due to platform differences:

- prompt_prefix_zsh, prompt_prefix_fish: hard-coded Linux shell paths
- motd_pager, motd_debounced_pager_*, motd_env_test_pager_*: PTY pager output doesn't reach client on macOS

Update HACKING.md to document skipped tests.
[env]
PS1 = "prompt> "
TERM = ""
BASH_SILENCE_DEPRECATION_WARNING = "1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you guys want a comment explaining this? It's explained in my commit message.

Alternatively we can stop on a character that is not in the deprecation warning...

There is probably a more general solution to this like skipping shell integration tests when they are not installed or on the right version or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a comment directly in the config file would be good. I think the way you have it is better than carefully choosing test data to avoid the conflict because it is clearer what is going on. I doubt anyone would want to change the test data in the future, but if they ever did a carefully choosen test string would be fragile in a non-obvious way.

@c22
Copy link
Contributor Author

c22 commented Feb 5, 2026

Also regarding the shell integration tests that get skipped (zsh and fish), they are only really a problem on macOS because the paths are hardcoded. This would also be a problem on other *nix environments where the shells aren't where the tests expect them to be (the tests pass on macOS if you brew install fish and then update the paths accordingly.

We could possibly use something like https://crates.io/crates/which to find the shell paths dynamically, but it introduces a dev dependency. Perhaps something for the future.

@ethanpailes
Copy link
Contributor

We could possibly use something like https://crates.io/crates/which to find the shell paths dynamically, but it introduces a dev dependency. Perhaps something for the future.

Yeah this definitely seems like something that should be solveable with a little programming. I'm happy to punt on it for now though.

Copy link
Contributor

@ethanpailes ethanpailes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I patched this in locally on my macbook and while there is some flakyness all the tests pass. I'm going to go ahead and merge this.

Thank you so much for the work on this port @c22 and @seruman!

[env]
PS1 = "prompt> "
TERM = ""
BASH_SILENCE_DEPRECATION_WARNING = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a comment directly in the config file would be good. I think the way you have it is better than carefully choosing test data to avoid the conflict because it is clearer what is going on. I doubt anyone would want to change the test data in the future, but if they ever did a carefully choosen test string would be fragile in a non-obvious way.

@ethanpailes ethanpailes merged commit 7030966 into shell-pool:master Feb 5, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants